Final Project Spring Christopher Rabeony ========================================================

Spotify

Background Information: Spotify is a Swedish audio streaming platform that provides music and podcasts from record labels and media companies.


Introduction to my Project

The data for this project is extracted from the ‘Spotify Top 200’ weekly streaming charts. The URL for this website is https://spotifycharts.com/regional/.

Six Different Countries

Spotify not only includes streaming information from the United States, but from other countries as well. For this project we will include data from the United States, Argentina, Bolivia, the United Kingdom, Belgium, and Australia.

Because the extraction and cleaning steps are the same for every country, I will use helper functions to avoid repeating code.

Starting with the United States top 200

Specify and read in the URL. Extract the table node and select the first table as a data frame.

USA <- get_tbl("https://spotifycharts.com/regional/us/weekly/latest")
head(USA, 3)
##   Var.1 Var.2 Var.3
## 1    NA     1    NA
## 2    NA     2    NA
## 3    NA     3    NA
##                                                                                                                          Track
## 1                           RAPSTAR\n                                                                                by Polo G
## 2        Kiss Me More (feat. SZA)\n                                                                                by Doja Cat
## 3 MONTERO (Call Me By Your Name)\n                                                                                by Lil Nas X
##      Streams
## 1 12,305,863
## 2 11,112,936
## 3  9,994,317
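
The get_tbl helper called above is not shown in this write-up. A minimal sketch of what such a helper might look like follows, assuming the rvest package; the name get_tbl_sketch and the small demo table are hypothetical, and the real helper may differ.

```r
library(rvest)

# Hypothetical stand-in for get_tbl, which is not shown in this write-up.
# `page` may be a URL, a file path, or a literal HTML string.
get_tbl_sketch <- function(page) {
  tables <- html_table(html_nodes(read_html(page), "table"))
  if (length(tables) == 0) stop("no <table> found on the page")
  as.data.frame(tables[[1]])   # keep only the first table
}

# Offline demonstration on a small literal HTML table
demo_html <- "<table>
  <tr><th>Track</th><th>Streams</th></tr>
  <tr><td>RAPSTAR by Polo G</td><td>12,305,863</td></tr>
</table>"
demo_tbl <- get_tbl_sketch(demo_html)
```

Passing a chart URL instead of demo_html would follow the same path, provided the page still serves its chart as a plain HTML table.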

Cleaning up the U.S. top 200

Tables imported from web pages usually need cleaning up before they can be analyzed.

spotify_USA <- clean_df(USA)
spotify_USA$Code <- str_replace_all(spotify_USA$Code, "\\d{1,3}", "NA")
head(spotify_USA, 4)
##   Code
## 1   NA
## 2   NA
## 3   NA
## 4   NA
##                                                                                                                   Track
## 1                                RAPSTAR                                                                               
## 2               Kiss Me More (feat. SZA)                                                                               
## 3         MONTERO (Call Me By Your Name)                                                                               
## 4 Peaches (feat. Daniel Caesar & Giveon)                                                                               
##          Artist  Streams
## 1        Polo G 12305863
## 2      Doja Cat 11112936
## 3     Lil Nas X  9994317
## 4 Justin Bieber  8697957

The data we extracted above lists the top 200 songs streamed in a given country. We are given the name of each song, the artist involved in its creation, the total number of streams, and its rank in the top 200. I created a column called “Code” which displays the continent that each data table represents. For the USA table we use “NA” (North America).
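
The clean_df helper is not shown either. The sketch below is one way the cleaning could be done in base R, assuming the raw columns Var.2, Track and Streams seen in the first table; clean_df_sketch and raw_demo are hypothetical names, and unlike the output above this version also trims the padding around the track title.

```r
# Hypothetical stand-in for clean_df; the real helper is not shown.
clean_df_sketch <- function(raw) {
  # Each Track cell looks like "Title\n      by Artist": split on the newline
  parts  <- strsplit(as.character(raw$Track), "\n", fixed = TRUE)
  title  <- vapply(parts, function(p) p[1], character(1))
  artist <- vapply(parts, function(p) p[length(p)], character(1))
  data.frame(
    Code    = as.character(raw$Var.2),             # chart rank, overwritten by a continent code later
    Track   = trimws(title),
    Artist  = sub("^by\\s+", "", trimws(artist)),  # drop the leading "by "
    Streams = as.numeric(gsub(",", "", raw$Streams, fixed = TRUE)),
    stringsAsFactors = FALSE
  )
}

raw_demo <- data.frame(
  Var.1 = NA, Var.2 = 1:2, Var.3 = NA,
  Track = c("RAPSTAR\n      by Polo G", "Kiss Me More (feat. SZA)\n      by Doja Cat"),
  Streams = c("12,305,863", "11,112,936"),
  stringsAsFactors = FALSE
)
clean_demo <- clean_df_sketch(raw_demo)
clean_demo$Code <- "NA"   # the string "NA" for North America, not a missing value
```

Assigning the continent code directly, as in the last line, has the same effect here as replacing the rank digits with a regular expression.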

International information.

Now we include the other five countries in the same format.

Argentina (South America)

head(spotify_AR, 2)
##   Code
## 1   SA
## 2   SA
##                                                                                                                 Track
## 1                                 Fiel                                                                               
## 2 L-Gante: Bzrp Music Sessions, Vol.38                                                                               
##                                Artist Streams
## 1 Los Legendarios, Wisin, Jhay Cortez 2118367
## 2                   Bizarrap, L-Gante 1932223

Bolivia (South America)

head(spotify_BO, 2)
##   Code
## 1   SA
## 2   SA
##                                                                                                 Track
## 1 Botella Tras Botella                                                                               
## 2       Pareja Del Año                                                                               
##                         Artist Streams
## 1     Gera MX, Christian Nodal  403616
## 2 Sebastian Yatra, Myke Towers  261131

United Kingdom (Europe)

head(spotify_UK, 2)
##   Code
## 1   EU
## 2   EU
##                                                                                                                   Track
## 1         MONTERO (Call Me By Your Name)                                                                               
## 2 Peaches (feat. Daniel Caesar & Giveon)                                                                               
##          Artist Streams
## 1     Lil Nas X 3460160
## 2 Justin Bieber 2552651

Belgium (Europe)

BE <- get_tbl("https://spotifycharts.com/regional/be/weekly/latest")
spotify_BE <- clean_df(BE)
spotify_BE$Code <- str_replace_all(spotify_BE$Code, "\\d{1,3}", "EU")

head(spotify_BE, 2)
##   Code
## 1   EU
## 2   EU
##                                                                                                                               Track
## 1                     MONTERO (Call Me By Your Name)                                                                               
## 2 Friday (feat. Mufasa & Hypeman) - Dopamine Re-Edit                                                                               
##                 Artist Streams
## 1            Lil Nas X  519558
## 2 Riton, Nightcrawlers  398628

Finally, Australia (Australia)

head(spotify_AU, 4)
##   Code
## 1  AUS
## 2  AUS
## 3  AUS
## 4  AUS
##                                                                                                                   Track
## 1         MONTERO (Call Me By Your Name)                                                                               
## 2 Peaches (feat. Daniel Caesar & Giveon)                                                                               
## 3                             Heat Waves                                                                               
## 4               Kiss Me More (feat. SZA)                                                                               
##          Artist Streams
## 1     Lil Nas X 1626491
## 2 Justin Bieber 1513052
## 3 Glass Animals 1486195
## 4      Doja Cat 1443893


Next Step: Data Manipulation and Representation

With our datasets imported and cleaned, we can now manipulate the data to reveal new information.

Creating a dataframe that represents Global Streams

spotify_Global <- bind_rows(spotify_EU, spotify_NA, spotify_AUS, spotify_SA) %>%
  arrange(desc(Streams)) %>%
  group_by(Code)
spotify_Global
## # A tibble: 993 x 4
## # Groups:   Code [4]
##    Code  Track                                    Artist                 Streams
##    <chr> <chr>                                    <chr>                    <dbl>
##  1 NA    "RAPSTAR                               … Polo G                  1.23e7
##  2 NA    "Kiss Me More (feat. SZA)              … Doja Cat                1.11e7
##  3 NA    "MONTERO (Call Me By Your Name)        … Lil Nas X               9.99e6
##  4 NA    "Peaches (feat. Daniel Caesar & Giveon)… Justin Bieber           8.70e6
##  5 NA    "Save Your Tears (with Ariana Grande) (… The Weeknd              7.80e6
##  6 NA    "Levitating (feat. DaBaby)             … Dua Lipa                7.68e6
##  7 NA    "deja vu                               … Olivia Rodrigo          7.40e6
##  8 NA    "Astronaut In The Ocean                … Masked Wolf             6.28e6
##  9 NA    "Heartbreak Anniversary                … Giveon                  5.92e6
## 10 NA    "Leave The Door Open                   … Bruno Mars, Anderson …  5.69e6
## # … with 983 more rows
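
The continent tables passed to bind_rows above (spotify_EU, spotify_NA, and so on) are not built in the code shown. A plausible sketch, assuming each one simply stacks the cleaned country tables that share a continent code; the _demo names and the one-row tables are illustrative only.

```r
library(dplyr)

# Toy country tables in the cleaned format (Code, Track, Artist, Streams)
spotify_UK_demo <- data.frame(Code = "EU", Track = "MONTERO (Call Me By Your Name)",
                              Artist = "Lil Nas X", Streams = 3460160,
                              stringsAsFactors = FALSE)
spotify_BE_demo <- data.frame(Code = "EU", Track = "MONTERO (Call Me By Your Name)",
                              Artist = "Lil Nas X", Streams = 519558,
                              stringsAsFactors = FALSE)

# Assumed step: stack the countries of one continent into a single table
spotify_EU_demo <- bind_rows(spotify_UK_demo, spotify_BE_demo)
```

Stacking keeps one row per country chart entry, so a song charting in both countries appears twice until the streams are summed later.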

A single data frame can provide a plethora of information and answer a lot of questions.


Representing the Overall Number of Spotify Streams across each Continent.

continentStreams 

Why is streaming so much higher in North America compared to Europe, even though Spotify was founded in Sweden, and released first in the United Kingdom?

Distribution of Total Streams.

Let’s visualize the distribution of total streams for the last week.

streamDist
## # A tibble: 4 x 2
##   Code  totalStreams
##   <chr>        <dbl>
## 1 NA       485414548
## 2 EU       153489363
## 3 SA        95146044
## 4 AUS       77250137
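
The code that produced streamDist is not shown. A summary of this shape could be computed roughly as follows, assuming dplyr and a table with Code and Streams columns; global_demo and its numbers are made up for illustration.

```r
library(dplyr)

# Toy stand-in for spotify_Global (real values would come from the scraped charts)
global_demo <- data.frame(
  Code    = c("NA", "NA", "EU", "SA"),
  Streams = c(100, 50, 30, 20),
  stringsAsFactors = FALSE
)

# One row per continent code, with streams summed and sorted from largest to smallest
streamDist_sketch <- global_demo %>%
  group_by(Code) %>%
  summarise(totalStreams = sum(Streams)) %>%
  arrange(desc(totalStreams))
```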

Let’s create a pie chart to view how each continent’s streaming numbers compare with the others.

How is the total number of global streams distributed across the continents?

Stream_circle
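
The plotting code behind Stream_circle is not shown. As a sketch, the shares can be worked out from the totals printed earlier and drawn as a pie chart with ggplot2; Stream_circle_sketch and streamDist_demo are hypothetical names, and the styling of the original chart is unknown.

```r
library(ggplot2)

# Totals copied from the streamDist output
streamDist_demo <- data.frame(
  Code = c("NA", "EU", "SA", "AUS"),
  totalStreams = c(485414548, 153489363, 95146044, 77250137),
  stringsAsFactors = FALSE
)
# Each continent's fraction of all streams in the combined table
streamDist_demo$share <- streamDist_demo$totalStreams / sum(streamDist_demo$totalStreams)

# A pie chart in ggplot2 is a single stacked bar drawn in polar coordinates
Stream_circle_sketch <- ggplot(streamDist_demo, aes(x = "", y = share, fill = Code)) +
  geom_col(width = 1) +
  coord_polar(theta = "y") +
  labs(x = NULL, y = NULL, fill = "Continent") +
  theme_void()
```

With these totals the shares come to roughly 60% for North America, 19% for Europe, 12% for South America and 10% for Australia.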

Find the top 3 songs in each continent.

spotify_topSongs <- spotify_Global %>%
  arrange(desc(Streams)) %>%
  group_by(Code) %>% slice(1:3)
eachCountry

As we can see, there’s a lot of crossover when it comes to artists and their international audiences.

Finding and Analyzing Aggregate Data

Now that we have looked at streaming information for each continent closely, let’s look at our overall Global information.

spotify_globalSongs
## # A tibble: 699 x 3
## # Groups:   Track [693]
##    Track                                         Artist                  Streams
##    <chr>                                         <chr>                     <dbl>
##  1 "MONTERO (Call Me By Your Name)             … Lil Nas X                1.64e7
##  2 "RAPSTAR                                    … Polo G                   1.64e7
##  3 "Kiss Me More (feat. SZA)                   … Doja Cat                 1.53e7
##  4 "Peaches (feat. Daniel Caesar & Giveon)     … Justin Bieber            1.42e7
##  5 "Levitating (feat. DaBaby)                  … Dua Lipa                 1.10e7
##  6 "deja vu                                    … Olivia Rodrigo           1.07e7
##  7 "Save Your Tears (with Ariana Grande) (Remix… The Weeknd               1.05e7
##  8 "Astronaut In The Ocean                     … Masked Wolf              9.40e6
##  9 "Leave The Door Open                        … Bruno Mars, Anderson .…  8.20e6
## 10 "Heartbreak Anniversary                     … Giveon                   8.12e6
## # … with 689 more rows
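
The code behind spotify_globalSongs is not shown. Its shape (one row per track and artist, still grouped by Track) is consistent with summing streams over every chart a song appears in, which could be sketched as below; charts_demo and the shortened track names are illustrative.

```r
library(dplyr)

# Toy rows: the same song charting in several regions
charts_demo <- data.frame(
  Code    = c("NA", "EU", "AUS", "NA"),
  Track   = c("MONTERO", "MONTERO", "MONTERO", "RAPSTAR"),
  Artist  = c("Lil Nas X", "Lil Nas X", "Lil Nas X", "Polo G"),
  Streams = c(9994317, 3460160, 1626491, 12305863),
  stringsAsFactors = FALSE
)

# Sum streams per song across all charts, then rank songs globally
globalSongs_sketch <- charts_demo %>%
  group_by(Track, Artist) %>%
  summarise(Streams = sum(Streams)) %>%
  arrange(desc(Streams))
```

This also shows why a song can lead the global list without leading the U.S. chart: streams from several regions add up.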

Let’s create a graphical representation of the data frame above by graphing the overall top 200 songs streamed on the Spotify website.

popularSongs
## Warning in RColorBrewer::brewer.pal(n, pal): n too large, allowed maximum for palette Blues is 9
## Returning the palette you asked for with that many colors

The ‘spotifyr’ Package

Another method I would like to integrate into my project is the Spotify Developer Tools Web API. Obtaining an API key is free and simple once an account is made.

library(spotifyr)

The spotifyr package pulls a variety of audio features from Spotify’s Web API. Once we have the key and authorization, we can retrieve a variety of information in seconds.

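# api-keysSpotify.R is expected to define api.key.spotify and api.spotify.clientID
spotify_client_id <- source("/Users/chris/Documents/DataWranglingHusbandry/DataWranglingFinalProject/api-keysSpotify.R")
# spotifyr reads its credentials from these two environment variables
Sys.setenv(SPOTIFY_CLIENT_ID = api.key.spotify)
Sys.setenv(SPOTIFY_CLIENT_SECRET = api.spotify.clientID)
# Request an access token using the credentials set above
access_token <- get_spotify_access_token()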

Now let’s go back to the data that contained the most streamed songs globally.

head(spotify_globalSongs)
## # A tibble: 6 x 3
## # Groups:   Track [6]
##   Track                                                     Artist       Streams
##   <chr>                                                     <chr>          <dbl>
## 1 "MONTERO (Call Me By Your Name)                         … Lil Nas X     1.64e7
## 2 "RAPSTAR                                                … Polo G        1.64e7
## 3 "Kiss Me More (feat. SZA)                               … Doja Cat      1.53e7
## 4 "Peaches (feat. Daniel Caesar & Giveon)                 … Justin Bieb…  1.42e7
## 5 "Levitating (feat. DaBaby)                              … Dua Lipa      1.10e7
## 6 "deja vu                                                … Olivia Rodr…  1.07e7

Let’s take the top 5 artists on this list and find out whether there’s something in their music that makes their songs the most popular in the world.

Find the audio track information for each song in the top 50.

We will first take the top 50 most streamed songs in the world.

Import our data using the ‘search_spotify’ function to retrieve more detailed information for each track.

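# Search Spotify for tracks matching each entry in spotifyTop50 and tag the results with that entry
artist_audio_features <- map_df(spotifyTop50, function(artist) {
    search_spotify(artist, "track") %>%
    mutate(artist_name = artist)
})
# Keep the single most popular track per artist_name (spotifyFilter1 is not defined in this write-up)
spotifytopInformation <- spotifyFilter1 %>% group_by(artist_name) %>% arrange(desc(popularity)) %>% slice(1)
head(spotifytopInformation)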
## # A tibble: 6 x 5
## # Groups:   artist_name [6]
##   artist_name          id          name              popularity album.release_d…
##   <chr>                <chr>       <chr>                  <int> <date>          
## 1 24kGoldn             4jPy3l0RUw… Mood (feat. iann…         90 2021-03-26      
## 2 Ariana Grande        37BZB0z9T8… Save Your Tears …         90 2021-04-23      
## 3 AURORA               3Z0oQ8r78O… Into the Unknown          76 2019-11-15      
## 4 Bad Bunny, Jhay Cor… 47EiUVwUp4… DÁKITI                    90 2020-10-30      
## 5 Billie Eilish        54bFM56PmE… Therefore I Am            88 2020-11-12      
## 6 Bruno Mars, Anderso… 7MAibcTli4… Leave The Door O…         96 2021-03-05

Above are the most popular songs by our top artists.

Find the audio track features for each song in the top 50.

spotifytrackInfo <- spotifytopInformation$id
spotifytrackFeatures <- get_track_audio_features(spotifytrackInfo)
spotifytrackAnalysis <- get_tracks(spotifytrackInfo) %>% select(9,7,10)
head(trackInformation, 2)
##                                           name                     id
## 1                       Mood (feat. iann dior) 4jPy3l0RUwlUI9T5XHBW2m
## 2 Save Your Tears (with Ariana Grande) (Remix) 37BZB0z9T8Xu7U3e65qxFy
##   popularity danceability energy key loudness mode speechiness acousticness
## 1         90        0.701  0.716   7   -3.671    0      0.0361       0.1740
## 2         90        0.650  0.825   0   -4.645    1      0.0325       0.0215
##   instrumentalness liveness valence   tempo           type
## 1         0.00e+00   0.3240   0.732  91.007 audio_features
## 2         2.44e-05   0.0936   0.593 118.091 audio_features
##                                    uri
## 1 spotify:track:4jPy3l0RUwlUI9T5XHBW2m
## 2 spotify:track:37BZB0z9T8Xu7U3e65qxFy
##                                                 track_href
## 1 https://api.spotify.com/v1/tracks/4jPy3l0RUwlUI9T5XHBW2m
## 2 https://api.spotify.com/v1/tracks/37BZB0z9T8Xu7U3e65qxFy
##                                                       analysis_url duration_ms
## 1 https://api.spotify.com/v1/audio-analysis/4jPy3l0RUwlUI9T5XHBW2m      140533
## 2 https://api.spotify.com/v1/audio-analysis/37BZB0z9T8Xu7U3e65qxFy      191014
##   time_signature
## 1              4
## 2              4
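
The trackInformation table printed above is not built in the code shown. Its column order (name, id, popularity, then the audio features) is what a join on id would give, so a sketch under that assumption, using dplyr and two small made-up inputs, might look like this:

```r
library(dplyr)

# Toy versions of the two API results; values copied from the printed rows
tracks_demo <- data.frame(
  name = c("Mood (feat. iann dior)", "Save Your Tears (with Ariana Grande) (Remix)"),
  id = c("4jPy3l0RUwlUI9T5XHBW2m", "37BZB0z9T8Xu7U3e65qxFy"),
  popularity = c(90L, 90L),
  stringsAsFactors = FALSE
)
features_demo <- data.frame(
  id = c("37BZB0z9T8Xu7U3e65qxFy", "4jPy3l0RUwlUI9T5XHBW2m"),
  danceability = c(0.650, 0.701),
  energy = c(0.825, 0.716),
  stringsAsFactors = FALSE
)

# Match rows by track id, so the order of the two inputs does not matter
trackInformation_sketch <- left_join(tracks_demo, features_demo, by = "id")
```

Joining on id rather than binding columns side by side guards against the two API calls returning tracks in a different order.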

The Spotify for Developers App does a great job of analyzing the musical characteristics of each and every song. These features include a song’s “danceability”, “tempo”, “liveness”, “energy”, and “acousticness”.

Is there any relation between song popularity and characteristics?

I want to create a linear model that might be able to find any strong correlation between these key characteristics and how these musical tracks will be received by the general public.

topSongs.lm <- lm(formula = popularity ~ acousticness + liveness + energy + valence + loudness + tempo, data = trackInformation)
summary(topSongs.lm)
## 
## Call:
## lm(formula = popularity ~ acousticness + liveness + energy + 
##     valence + loudness + tempo, data = trackInformation)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -11.3924  -2.5995   0.3427   2.9475   9.4733 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  90.42218    6.41954  14.085  < 2e-16 ***
## acousticness  2.60360    3.42400   0.760  0.45021    
## liveness     17.17105    5.60825   3.062  0.00338 ** 
## energy        6.54156    6.52208   1.003  0.32018    
## valence       0.61651    2.86165   0.215  0.83021    
## loudness      0.28017    0.37648   0.744  0.45988    
## tempo        -0.05539    0.02436  -2.273  0.02685 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 4.639 on 56 degrees of freedom
## Multiple R-squared:  0.3242, Adjusted R-squared:  0.2518 
## F-statistic: 4.478 on 6 and 56 DF,  p-value: 0.0009048

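Based on this model, no firm conclusion can be drawn. Only liveness (p ≈ 0.003) and tempo (p ≈ 0.027) are statistically significant at the 5% level, and the model explains only about a third of the variation in popularity (R-squared 0.32, adjusted 0.25). There doesn’t seem to be a strong relationship between the popularity of a given song and its audio features.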
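
One way to read a model summary like the one above is to pull out the terms whose p-values fall below a chosen threshold. The helper below is a small base R sketch of that idea; significant_terms is a hypothetical name, and the demo data is simulated rather than taken from Spotify.

```r
# Return the coefficient rows (excluding the intercept) with p-value below alpha
significant_terms <- function(model, alpha = 0.05) {
  coefs <- coef(summary(model))
  keep <- coefs[, "Pr(>|t|)"] < alpha & rownames(coefs) != "(Intercept)"
  coefs[keep, , drop = FALSE]
}

# Simulated data: popularity depends on `signal` but not on `noise`
set.seed(1)
demo <- data.frame(signal = rnorm(100), noise = rnorm(100))
demo$popularity <- 80 + 3 * demo$signal + rnorm(100)
fit <- lm(popularity ~ signal + noise, data = demo)
sig <- significant_terms(fit)
```

Applied to topSongs.lm, this would return the liveness and tempo rows, going by the p-values printed in the summary.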

Graphical Representation

trackInfo <- gather(trackInformation, 'danceability':'tempo', key = 'characteristic', value = 'value')
ggplot(trackInfo, aes(value, popularity)) + geom_point() + facet_wrap(~characteristic, ncol = 5, scales = "free_x")

The top 10 most streamed artists in the world.

head(spotify_globalArtists, 10)
## # A tibble: 10 x 2
##    Artist          Streams
##    <chr>             <dbl>
##  1 Justin Bieber  29802117
##  2 Polo G         24672052
##  3 The Weeknd     23630771
##  4 Doja Cat       21550705
##  5 Dua Lipa       19356819
##  6 Olivia Rodrigo 18246898
##  7 Juice WRLD     18018010
##  8 Lil Nas X      17989509
##  9 Drake          17702858
## 10 Pop Smoke      13205093
mostStreamed

A Successful Discography

While an artist might have a massive number of streams, this doesn’t mean they have a successful music career overall. Having multiple songs in the top 200 could be seen as a higher benchmark for success.

Counting how many songs an artist has in the global top 200.

spotify_globalAppearances
## # A tibble: 514 x 2
## # Groups:   Artist [514]
##    Artist              n
##    <chr>           <int>
##  1 Damso              15
##  2 Christian Nodal    10
##  3 Duki                8
##  4 Morgan Wallen       8
##  5 Juice WRLD          7
##  6 Bad Bunny           6
##  7 Ed Sheeran          6
##  8 Justin Bieber       6
##  9 AJ Tracey           5
## 10 Camilo              5
## # … with 504 more rows
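
The counting step itself is not shown. Assuming it counts rows per artist in a one-row-per-song table such as spotify_globalSongs, it could be sketched with dplyr as follows; songs_demo is made-up data.

```r
library(dplyr)

# Toy table with one row per distinct song
songs_demo <- data.frame(
  Track  = c("Peaches", "Hold On", "RAPSTAR", "Anyone"),
  Artist = c("Justin Bieber", "Justin Bieber", "Polo G", "Justin Bieber"),
  stringsAsFactors = FALSE
)

# Number of songs per artist, most frequent first
appearances_sketch <- songs_demo %>% count(Artist, sort = TRUE)
```

Note that a combined credit such as "Bruno Mars, Anderson .Paak" is counted as its own artist under this approach, separate from either performer alone.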

Create a Word Cloud

spotify_globalAppearances %>%
with(wordcloud(words = Artist, n, max.words = 30, random.order = FALSE, colors = brewer.pal(8, "Dark2")))

Note that Lil Nas X, the creator of “Old Town Road”, is not even in the top 10 for the number of songs he has in the top 200, even though he has dominated the streaming numbers. We call this type of phenomenon a one-hit wonder.

Let’s create a graph that will compare the energy (intensity) and valence (emotion) for each of our artists.

Graph on Billie Eilish (North American - Pop)

BillieEilish_Energy <- getEnergy_graph(Billie_Eilish)
BillieEilish_Energy

Graph on PNL (European - French Rap)

PNL_Energy <- getEnergy_graph(PNL)
PNL_Energy

Graph on SchoolBoy Q (North American - Rap/Hip-Hop)

SchoolBoy_Energy <- getEnergy_graph(SchoolBoy_Q)
SchoolBoy_Energy

Graph on Sebastian Yatra (South American - Latin/Reggaeton)

Sebastian_Energy <- getEnergy_graph(Sebastian_Yatra)
Sebastian_Energy

Graph on Post Malone (American - Hip Hop/Pop)

PostMalone_Energy <- getEnergy_graph(Post_Malone)
PostMalone_Energy

Graph on Khalid (American - R&B)

Khalid_Energy <- getEnergy_graph(Khalid)
Khalid_Energy
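
The getEnergy_graph helper used for these artist graphs is not shown. A minimal sketch of such a plot with ggplot2 follows, assuming the input is a table of an artist's tracks with valence and energy columns on Spotify's 0 to 1 scale (for example, the kind of table spotifyr's get_artist_audio_features returns); getEnergy_graph_sketch and artist_demo are hypothetical.

```r
library(ggplot2)

# Hypothetical stand-in for getEnergy_graph; expects `valence` and `energy` columns in [0, 1]
getEnergy_graph_sketch <- function(tracks) {
  ggplot(tracks, aes(x = valence, y = energy)) +
    geom_point(alpha = 0.7) +
    # Dashed midlines split the plot into four mood quadrants
    geom_vline(xintercept = 0.5, linetype = "dashed") +
    geom_hline(yintercept = 0.5, linetype = "dashed") +
    scale_x_continuous(limits = c(0, 1)) +
    scale_y_continuous(limits = c(0, 1)) +
    labs(x = "Valence (emotion)", y = "Energy (intensity)")
}

# Made-up tracks, only to show the call
artist_demo <- data.frame(valence = c(0.12, 0.35, 0.60), energy = c(0.25, 0.40, 0.30))
demo_plot <- getEnergy_graph_sketch(artist_demo)
```

Points in the lower-left quadrant are tracks that are both low in energy and low in valence.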

Concluding Thoughts

We can tentatively conclude that some of the most popular music in this day and age consists of low-intensity sounds with very dark material.

However, the biggest point I want to make is: music tastes aren’t objective.

A lot of our enjoyment in music comes from our socioeconomic backgrounds, how our environment has influenced us, and what’s readily available for us to listen to.